Security Management in the Cloud - PaaS Availability Management

12/4/2010 3:26:45 PM

In a typical PaaS service, customers (developers) build and deploy PaaS applications on top of the CSP-supplied PaaS platform. The PaaS platform is typically built on a CSP-owned and CSP-managed network, servers, operating systems, storage infrastructure, and application components (web services). Because customer PaaS applications are assembled from CSP-supplied application components and, in some cases, third-party web services components (mash-up applications), availability management of the PaaS application can be complicated. An example is a social network application on the Google App Engine that depends on a Facebook application for a contact management service. In such a mash-up deployment architecture, the onus of availability management is shared between the customer and the CSP. The customer is responsible for managing the availability of the customer-developed application and third-party services, and the PaaS CSP is responsible for the PaaS platform and any other services supplied by the CSP. For example, Force.com is responsible for the management of the AppExchange platform, and customers are responsible for managing the applications developed and deployed on that platform.

By design, PaaS applications may rely on third-party web services components that are not part of the PaaS service offerings (e.g., your Web 2.0 application using Google Maps for geo mapping); hence, understanding the dependency of your application on third-party services, as well as on services supplied by the PaaS vendor, is essential. PaaS providers may also offer a set of web services, including a message queue service, an identity and authentication service, and a database service, and your application may depend on the availability of those service components (an example is Google’s BigTable). Hence, your PaaS application availability depends on the robustness of your application, the PaaS platform on which the application is built, and third-party web services components.
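To make the dependency chain concrete, the following is a minimal sketch of how the three factors combine when every component must be up for the application to work. It assumes that failures are independent, which real services rarely guarantee, and the availability figures are hypothetical rather than taken from any real SLA.

```python
from math import prod


def composite_availability(availabilities):
    """Availability of a chain in which every component must be up.

    Assumes independent failures, so treat the result as a rough estimate.
    """
    values = list(availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    return prod(values)


# Hypothetical figures, not taken from any real SLA.
components = {
    "customer application": 0.999,
    "PaaS platform": 0.9995,
    "third-party web service": 0.999,
}

overall = composite_availability(components.values())
hours_down_per_year = (1 - overall) * 365 * 24
print(f"overall availability: {overall:.4%}")
print(f"expected downtime: {hours_down_per_year:.1f} hours/year")
```

Under these assumptions, the combined figure is always lower than that of the weakest component, which is why each added dependency deserves its own review.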

Customers are encouraged to read and understand the PaaS platform service levels (if available), including quota triggers that may limit resource availability for their application (usually outlined in the SLA, or in the terms and conditions of the PaaS service). In cases where the PaaS platform enforces quotas on compute resources (CPU, memory, network I/O), upon reaching the thresholds the application may not be able to respond within the normal latency expectations and could eventually become unavailable. For example, the Google App Engine has a quota system whereby each App Engine resource is measured against one of two kinds of quotas: a billable quota or a fixed quota.

Billable quotas are resource maximums set by you, the application’s administrator, to prevent the cost of the application from exceeding your budget. Every application gets an amount of each billable quota for free. You can increase billable quotas for your application by enabling billing, setting a daily budget, and then allocating the budget to the quotas. You will be charged only for the resources your app actually uses, and only for the amount of resources used above the free quota thresholds.

Fixed quotas are resource maximums set by the App Engine to ensure the integrity of the system. These resources describe the boundaries of the architecture, and all applications are expected to run within the same limits. They ensure that another app that is consuming too many resources will not affect the performance of your app.

You can find details on App Engine quotas at http://code.google.com/appengine/docs/quotas.html.
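The two kinds of quotas can be modeled in a few lines for the purpose of alerting before a threshold is reached. This is only a sketch: the `Quota` class, the resource names, and all numbers are hypothetical and do not correspond to any App Engine API or published limit.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """A hypothetical resource quota; not an App Engine API type."""
    name: str
    limit: float               # maximum allowed in the accounting period
    used: float = 0.0
    billable: bool = False
    free_amount: float = 0.0   # only meaningful for billable quotas

    def fraction_used(self) -> float:
        return self.used / self.limit if self.limit else 1.0

    def billed_usage(self) -> float:
        """Usage above the free threshold; zero for fixed quotas."""
        if not self.billable:
            return 0.0
        return max(0.0, self.used - self.free_amount)


def quotas_near_limit(quotas, warn_at=0.8):
    """Return names of quotas whose usage has reached warn_at of the limit."""
    return [q.name for q in quotas if q.fraction_used() >= warn_at]


# Illustrative numbers only.
quotas = [
    Quota("cpu_hours", limit=10.0, used=8.5, billable=True, free_amount=6.5),
    Quota("outgoing_bandwidth_gb", limit=5.0, used=1.0, billable=True, free_amount=1.0),
    Quota("requests_per_minute", limit=500, used=120),
]

print(quotas_near_limit(quotas))
print(quotas[0].billed_usage())
```

A warning threshold below 100% (here an assumed 80%) gives the administrator time to raise a billable quota or shed load before the application starts to slow down.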

Another example is Force.com’s Apex governor feature. Because the Apex application runs in a multitenant environment, the Apex runtime engine strictly enforces a number of limits to ensure that runaway scripts do not monopolize shared resources. Governors track and enforce the limits based on a policy shared with customers. If a script ever exceeds a limit, the associated governor issues a runtime exception that cannot be handled.
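The practical consequence for developers is that code must be written to stay under the limits, for example by batching work. The sketch below is a Python analogy, not Apex: the `Governor` class, the exception name, and the limit of 5 are all hypothetical, and unlike the Apex exception described above, a Python exception can be caught by the caller.

```python
class GovernorLimitExceeded(Exception):
    """Hypothetical stand-in for a platform's limit exception."""


class Governor:
    """Counts a metered operation and enforces a hard ceiling."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.count = 0

    def remaining(self):
        return self.limit - self.count

    def charge(self, amount=1):
        if self.count + amount > self.limit:
            raise GovernorLimitExceeded(
                f"{self.name}: {self.count + amount} exceeds limit {self.limit}"
            )
        self.count += amount


def process_in_batches(records, governor, batch_size):
    """Spend one governed operation per batch instead of one per record."""
    processed = 0
    for start in range(0, len(records), batch_size):
        governor.charge()  # one metered call covers the whole batch
        processed += len(records[start:start + batch_size])
    return processed


queries = Governor("queries", limit=5)  # illustrative limit
print(process_in_batches(list(range(40)), queries, batch_size=10))
print(queries.remaining())
```

With a batch size of 10, the 40 records cost four metered operations; processing them one at a time would exceed the assumed limit on the sixth record.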

1. Customer Responsibility

Considering all of the variable parameters in availability management, the PaaS application customer should carefully analyze the dependencies of the application on the third-party web services (components) and outline a holistic management strategy to manage and monitor all the dependencies.

The following considerations are for PaaS customers:

PaaS platform service levels: Customers should carefully review the terms and conditions of the CSP’s SLAs and understand the availability constraints.
Third-party web services provider service levels: When your PaaS application depends on a third-party service, it is critical to understand the SLA of that service. For example, your PaaS application may rely on services such as Google Maps and use the Google Maps API to embed maps in your own web pages with JavaScript.
Network connectivity parameters for the network (Internet) connecting the PaaS platform with third-party service providers: The parameters typically include bandwidth and latency factors.
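To see how bandwidth and latency interact for a single call from the PaaS platform to a third-party service, a first-order estimate adds the round-trip delay to the time needed to move the payload. The function name and the link figures below are assumptions for illustration; the estimate ignores connection setup, TLS handshakes, congestion, and server processing time.

```python
def estimated_call_time(payload_bytes, bandwidth_bits_per_s, round_trip_s, round_trips=1):
    """First-order estimate: round-trip delay plus payload transfer time."""
    if bandwidth_bits_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer_s = payload_bytes * 8 / bandwidth_bits_per_s
    return round_trips * round_trip_s + transfer_s


# Hypothetical link: 10 Mbit/s, 80 ms round trip, 250 kB response.
t = estimated_call_time(250_000, 10_000_000, 0.080)
print(f"{t * 1000:.0f} ms")
```

For small payloads the latency term dominates, so an application that makes many sequential calls to a distant service is limited by round trips rather than by bandwidth.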

2. PaaS Health Monitoring

PaaS applications are generally web-based applications hosted on the PaaS CSP platform (e.g., your Java or Python application hosted on the Google App Engine). Hence, most of the techniques and processes used for monitoring a SaaS application also apply to PaaS applications. Given the composition of PaaS applications, customers should monitor their application, as well as the third-party web component services. Configuring your management tools to monitor the health of web services requires knowledge of the web services protocol (HTTP, HTTPS) and the required protocol parameters (e.g., URI) to verify the availability of the service.
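A single availability probe over HTTP or HTTPS can be sketched with the Python standard library alone. The function name, the result fields, and the rule that any status from 200 through 399 counts as healthy are assumptions made for this example; the `opener` parameter exists only so that the check can be exercised without a network connection.

```python
import time
import urllib.error
import urllib.request


def check_health(url, timeout=5.0, opener=urllib.request.urlopen):
    """Probe a URL once and report its status code and latency."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # the server answered, but with a 4xx or 5xx code
    except (urllib.error.URLError, OSError):
        status = None       # no HTTP answer at all (DNS failure, timeout, refusal)
    latency = time.monotonic() - started
    return {
        "url": url,
        "status": status,
        "healthy": status is not None and 200 <= status < 400,
        "latency_s": latency,
    }


# Example call (hypothetical URL; performs a real network request):
# print(check_health("https://your-app.example.com/health"))
```

Recording latency alongside the status code matters because an application that has hit a quota may still answer, only more slowly than its normal expectations.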

When CSPs support monitoring via application programming interfaces (APIs), monitoring your application can involve a standard web services interface, such as Representational State Transfer (REST), Simple Object Access Protocol (SOAP), or eXtensible Markup Language over Hypertext Transfer Protocol (XML/HTTP), and in a few cases, proprietary protocols.

The following options are available to customers to monitor the health of their service:

Service health dashboard published by the CSP (e.g., http://status.zoho.com)
Cloud Computing Incidents Database (CCID) (this database is generally community-supported, and may not reflect all CSPs and all incidents that have occurred)
CSP customer mailing list that notifies customers of ongoing and recent outages
RSS feed for RSS readers with availability and outage information
Internal or third-party service monitoring tools (e.g., the Nagios monitoring tool) that periodically check your PaaS application, as well as the third-party web services on which your application depends
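Of the options above, the RSS feed is the easiest to consume programmatically. The following sketch filters an RSS 2.0 feed for items whose titles suggest an outage. The keyword list and the sample feed are hypothetical; a real CSP feed may use different wording and should be inspected before relying on a filter like this.

```python
import xml.etree.ElementTree as ET

OUTAGE_WORDS = ("outage", "degraded", "disruption", "unavailable")  # assumed vocabulary


def outage_items(rss_xml, keywords=OUTAGE_WORDS):
    """Return (title, pubDate) for RSS 2.0 items whose title mentions a keyword."""
    root = ET.fromstring(rss_xml)
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        if any(word in title.lower() for word in keywords):
            hits.append((title, item.findtext("pubDate", default="")))
    return hits


# Hypothetical feed content, for illustration only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example PaaS status</title>
  <item><title>Datastore outage resolved</title><pubDate>Sat, 04 Dec 2010 10:00:00 GMT</pubDate></item>
  <item><title>Scheduled maintenance complete</title><pubDate>Fri, 03 Dec 2010 08:00:00 GMT</pubDate></item>
</channel></rss>"""

print(outage_items(SAMPLE_FEED))
```

In practice the feed text would be fetched on a schedule and the matching items forwarded to whoever is on call, alongside the results of the tool-based checks.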